74 research outputs found

    Exposing the Limitations of Molecular Machine Learning with Activity Cliffs

    Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high levels of accuracy. However, activity cliffs – pairs of molecules that are highly similar in their structure but exhibit large differences in potency – have been underinvestigated for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that are well-equipped to accurately predict the potency of activity cliffs have an increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked more than 20 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. These results advocate for (a) the inclusion of dedicated “activity-cliff-centered” metrics during model development and evaluation, and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community towards addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs. This data deposit contains all trained models and the data used to train them. All models can be easily loaded and used to predict bioactivity on new molecules with MoleculeACE. Since models are target-specific, models are provided for all 30 data sets. Every model is accompanied by a configuration file that describes its (optimized) hyperparameters.
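
To make the activity-cliff definition in this abstract concrete, here is a minimal sketch. It is not MoleculeACE's implementation. It assumes fingerprints are supplied as sets of on-bit indices and potencies in log units (e.g. pKi), and it uses illustrative thresholds (Tanimoto similarity of at least 0.9, potency gap of at least one log unit, i.e. tenfold) that are not taken from the abstract. All function and variable names are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_activity_cliffs(fingerprints, potencies, sim_cutoff=0.9, potency_gap=1.0):
    """Return index pairs that are structurally similar yet far apart in potency.

    fingerprints: list of sets of on-bit indices (assumed input format)
    potencies:    list of potencies in log units, e.g. pKi (assumed)
    sim_cutoff, potency_gap: illustrative thresholds, not values from the source
    """
    cliffs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_cutoff
        far_apart = abs(potencies[i] - potencies[j]) >= potency_gap
        if similar and far_apart:
            cliffs.append((i, j))
    return cliffs


# Toy data: molecules 0 and 1 share 19 of 20 bits but differ by 2 log units;
# molecule 2 shares no bits with either.
fps = [set(range(20)), set(range(19)), set(range(100, 120))]
pki = [8.5, 6.5, 8.4]
cliff_pairs = find_activity_cliffs(fps, pki)
```

With the toy data, only the pair of molecules 0 and 1 meets both conditions. The pairwise loop is quadratic in the number of molecules, which is acceptable for a sketch but would need a smarter neighbor search on large data sets.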

    Structure-based drug discovery with deep learning

    Artificial intelligence (AI) in the form of deep learning bears promise for drug discovery and chemical biology, e.g., to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules de novo. While most of the deep learning efforts in drug discovery have focused on ligand-based approaches, structure-based drug discovery has the potential to tackle unsolved challenges, such as affinity prediction for unexplored protein targets, binding-mechanism elucidation, and the rationalization of related chemical kinetic properties. Advances in deep learning methodologies and the availability of accurate predictions for protein tertiary structure advocate for a renaissance in structure-based approaches for drug discovery guided by AI. This review summarizes the most prominent algorithmic concepts in structure-based deep learning for drug discovery, and forecasts opportunities, applications, and challenges ahead.

    A QSTR-based expert system to predict sweetness of molecules

    This work describes a novel approach based on advanced molecular similarity to predict the sweetness of chemicals. The proposed Quantitative Structure-Taste Relationship (QSTR) model is an expert system developed keeping in mind the five principles defined by the Organization for Economic Co-operation and Development (OECD) for the validation of (Q)SARs. The 649 sweet and non-sweet molecules were described by both conformation-independent extended-connectivity fingerprints (ECFPs) and molecular descriptors. In particular, the molecular similarity in the ECFP space showed a clear association with molecular taste and was exploited for model development. Molecules lying in the subspaces where the taste assignment was more difficult were modeled through a consensus between linear and local approaches (Partial Least Squares-Discriminant Analysis and N-nearest-neighbor classifier). The expert system, which was thoroughly validated through a Monte Carlo procedure and an external set, gave satisfactory results in comparison with the state-of-the-art models. Moreover, the QSTR model can be leveraged to better understand the relationship between molecular structure and sweetness, and to design novel sweeteners. Instituto de Investigaciones Fisicoquímicas Teóricas y Aplicadas; Facultad de Ciencias Exacta

    The cerium content of the Milky Way as revealed by Gaia DR3 GSP-Spec abundances

    The recent Gaia third data release contains a homogeneous analysis of millions of high-quality Radial Velocity Spectrometer (RVS) stellar spectra by the GSP-Spec module. This led to the estimation of millions of individual chemical abundances and allows us to chemically map the Milky Way. The published GSP-Spec abundances include three heavy elements produced by neutron captures in stellar interiors: Ce, Zr, and Nd. Aims. We study the Galactic content in cerium based on these Gaia/RVS data and discuss the chemical evolution of this element. Methods. We used a sample of about 30 000 local thermodynamic equilibrium (LTE) Ce abundances, selected after applying different combinations of GSP-Spec flags. Based on the Gaia DR3 astrometric data and radial velocities, we explore the cerium content in the Milky Way and, in particular, in its halo and disc components. Results. The high quality of the Ce GSP-Spec abundances is quantified through literature comparisons. We found a rather flat [Ce/Fe] versus [M/H] trend. We also found a flat radial gradient in the disc derived from field stars and, independently, from about 50 open clusters. This agrees with previous studies. The [Ce/Fe] vertical gradient was also estimated. We also report an increasing [Ce/Ca] versus [Ca/H] in the disc, illustrating the late contribution of asymptotic giant branch stars with respect to supernovae of type II. Our cerium abundances in the disc, including the young massive population, are well reproduced by a new three-infall chemical evolution model. In the halo population, the M 4 globular cluster is found to be enriched in cerium. Moreover, 11 stars with cerium abundances belonging to the Thamnos, Helmi Stream, and Gaia-Sausage-Enceladus accreted systems were identified from chemo-dynamical diagnostics. We found that the Helmi Stream might be slightly underabundant in cerium compared to the two other systems. Conclusions. This work illustrates the high quality of the GSP-Spec chemical abundances, which significantly contribute to unveiling the heavy-element evolution history of the Milky Way.
    Acknowledgements: We thank the referee for their valuable comments. ES received funding from the European Union’s Horizon 2020 research and innovation program under SPACE-H2020 grant agreement number 101004214 (EXPLORE project). ARB also acknowledges support from this Horizon program. PAP and EP thank the Centre National d’Etudes Spatiales (CNES) for funding support. VG acknowledges support from the European Research Council Consolidator Grant funding scheme (project ASTEROCHRONOMETRY, G.A. n. 772293, http://www.asterochronometry.eu). Special thanks to Niels Nieuwmunster and Botebar for helpful comments on figures. This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement.
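
For readers outside stellar spectroscopy: the bracket notation in this abstract is the standard logarithmic abundance ratio relative to the Sun. As a reminder, with $N_X$ denoting the number density of element X (the symbol choice here is an assumption, not taken from the abstract):

```latex
\[
  [\mathrm{Ce}/\mathrm{Fe}]
  \;=\;
  \log_{10}\!\left(\frac{N_{\mathrm{Ce}}}{N_{\mathrm{Fe}}}\right)_{\star}
  \;-\;
  \log_{10}\!\left(\frac{N_{\mathrm{Ce}}}{N_{\mathrm{Fe}}}\right)_{\odot}
\]
```

Under this convention, a flat [Ce/Fe] versus [M/H] trend means the cerium-to-iron ratio stays close to a constant multiple of the solar ratio across the sampled metallicities.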

    Engineering cytokine therapeutics

    Cytokines have pivotal roles in immunity, making them attractive as therapeutics for a variety of immune-related disorders. However, the widespread clinical use of cytokines has been limited by their short blood half-lives and severe side effects caused by low specificity and unfavourable biodistribution. Innovations in bioengineering have aided in advancing our knowledge of cytokine biology and yielded new technologies for cytokine engineering. In this Review, we discuss how the development of bioanalytical methods, such as sequencing and high-resolution imaging combined with genetic techniques, has facilitated a better understanding of cytokine biology. We then present an overview of therapeutics arising from cytokine re-engineering, targeting and delivery, mRNA therapeutics and cell therapy. We also highlight the application of these strategies to adjust the immunological imbalance in different immune-mediated disorders, including cancer, infection and autoimmune diseases. Finally, we look ahead to the hurdles that must be overcome before cytokine therapeutics can live up to their full potential.